Grid Binary LOgistic REgression (GLORE): building shared models without sharing data

Authors

  • Yuan Wu
  • Xiaoqian Jiang
  • Jihoon Kim
  • Lucila Ohno-Machado
Abstract

OBJECTIVE The classification of complex or rare patterns in clinical and genomic data requires the availability of a large, labeled patient set. While methods that operate on large, centralized data sources have been extensively used, little attention has been paid to understanding whether models such as binary logistic regression (LR) can be developed in a distributed manner, allowing researchers to share models without necessarily sharing patient data.

MATERIAL AND METHODS Instead of bringing data to a central repository for computation, we bring computation to the data. The Grid Binary LOgistic REgression (GLORE) model integrates decomposable partial elements or non-privacy-sensitive prediction values to obtain model coefficients, the variance-covariance matrix, the goodness-of-fit test statistic, and the area under the receiver operating characteristic (ROC) curve.

RESULTS We conducted experiments on both simulated and clinically relevant data, and compared the computational costs of GLORE with those of a traditional LR model estimated using the combined data. We showed that our results are the same as those of LR to a precision of 10^-15. In addition, GLORE is computationally efficient.

LIMITATION In GLORE, the calculation of coefficient gradients must be synchronized across sites, which involves some effort to ensure the integrity of communication. The predictors must also have the same format and meaning across the data sets.

CONCLUSION The results suggest that GLORE performs as well as LR and allows data to remain protected at their original sites.
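The idea of combining decomposable partial elements from each site can be illustrated with a small sketch. It is not the authors' implementation: it assumes Newton-Raphson updates in which each site returns only its summed gradient and information matrix for the current coefficients, and every name in it (site_summaries, fit, the simulated data and its coefficients) is hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def site_summaries(X, y, beta):
    """Runs locally at one site; only these aggregated sums leave the site."""
    d = len(beta)
    grad = [0.0] * d
    info = [[0.0] * d for _ in range(d)]  # negative Hessian of the log-likelihood
    for xi, yi in zip(X, y):
        p = sigmoid(sum(b * x for b, x in zip(beta, xi)))
        for j in range(d):
            grad[j] += (yi - p) * xi[j]
            for k in range(d):
                info[j][k] += p * (1.0 - p) * xi[j] * xi[k]
    return grad, info


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def fit(sites, d, max_iter=25, tol=1e-10):
    """Central coordinator: sums per-site pieces, then takes one Newton step."""
    beta = [0.0] * d
    for _ in range(max_iter):
        grad = [0.0] * d
        info = [[0.0] * d for _ in range(d)]
        for X, y in sites:  # one synchronized round per iteration
            g, h = site_summaries(X, y, beta)
            for j in range(d):
                grad[j] += g[j]
                for k in range(d):
                    info[j][k] += h[j][k]
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


# Simulated data (assumed coefficients), split across three hypothetical sites.
random.seed(0)
X, y = [], []
for _ in range(300):
    row = [1.0, random.gauss(0, 1), random.gauss(0, 1)]
    p_true = sigmoid(-0.5 + 1.0 * row[1] - 1.0 * row[2])
    X.append(row)
    y.append(1 if random.random() < p_true else 0)

sites = [(X[i:i + 100], y[i:i + 100]) for i in range(0, 300, 100)]
beta_grid = fit(sites, 3)        # data stay at the three sites
beta_pooled = fit([(X, y)], 3)   # same routine on the combined data
print(max(abs(a - b) for a, b in zip(beta_grid, beta_pooled)))
```

Because the gradient and information matrix are plain sums over patients, adding the per-site pieces gives the same Newton step as pooling the records, up to floating-point rounding. Under the same assumptions, the inverse of the aggregated information matrix at convergence is the usual estimate of the variance-covariance matrix.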

Similar resources

Secure Multi-pArty Computation Grid LOgistic REgression (SMAC-GLORE)

BACKGROUND In biomedical research, data sharing and information exchange are very important for improving quality of care, accelerating discovery, and promoting the meaningful secondary use of clinical data. A big concern in biomedical data sharing is the protection of patient privacy because inappropriate information leakage can put patient privacy at risk. METHODS In this study, we deployed...

Development of a Web Service for Analysis in a Distributed Network

OBJECTIVE We describe functional specifications and practicalities in the software development process for a web service that allows the construction of the multivariate logistic regression model, Grid Logistic Regression (GLORE), by aggregating partial estimates from distributed sites, with no exchange of patient-level data. BACKGROUND We recently developed and published a web service for mo...

Application of Ordinal Logistic Regression Models in Quality of Life Studies

 Background & Objectives: Due to the increasing tendency to measure the quality of life in recent years and the extensive quality of life questionnaires, it is important to determine the appropriate method of analyzing data derived from these studies. The aim of the present study was to introduce ordinal logistic regression models as an appropriate method for analyzing the data of quality of li...

Grid multi-category response logistic models

BACKGROUND Multi-category response models are very important complements to binary logistic models in medical decision-making. Decomposing model construction by aggregating computation developed at different sites is necessary when data cannot be moved outside institutions due to privacy or other concerns. Such decomposition makes it possible to conduct grid computing to protect the privacy of ...

Quantitative Structure - Activity Relationships Study of Carbonic Anhydrase Inhibitors Using Logistic Regression Model

Binary Logistic Regression (BLR) has been developed as a non-linear model to establish quantitative structure-activity relationships (QSAR) between structural descriptors and the biochemical activity of carbonic anhydrase inhibitors. Using a training set consisting of 21 compounds with known Ki values, the model was trained and tested to solve two-class problems as active or inactive on the basi...

Journal title:

Volume 19, Issue

Pages -

Publication date: 2012